Inferring language change from computer corpora: Some methodological problems

Author

  • Janet Holmes
Abstract

As the number and size of computer corpora grow, linguistic researchers are increasingly using them to study changes in language over time. Comparing usage at one point in time with usage at a later or an earlier period seems a stunningly simple and Saussureanly impeccable method of studying language change. Needless to say, the reality is rather different. This paper identifies some of the methodological problems encountered in using computer corpora to describe changes in sexist usages in New Zealand English (NZE) over a twenty-five-year period.

Similar resources

Extending the possibilities of corpus-based research on English in the twentieth century: a prequel to LOB and FLOB

This paper explains the rationale for a new corpus being assembled at Lancaster University to complement the existing Brown ‘family’ of corpora; that is, English language corpora modelled on the original Brown University corpus, such as LOB, Frown, FLOB, Wellington, etc. The purpose of the new corpus, called Lancaster1931, is to extend the chronological span of these corpora into the first half...

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in the English language are hyphenated, with a hyphen separating the parts. Persian also has multi-part words. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Choices over time: methodological issues in investigating current change

The fact that English is changing is immediately apparent to a modern reader of, say, 18th or 19th century literature, or indeed to a teenager speaking to an elderly relative. However, as Mair (2006) points out, anecdotal evidence for linguistic change is unreliable. The systematic study of language change requires large, evenly balanced, and reliably annotated corpora with texts sampled over a...

Combining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms

Learning Bayesian Belief Networks (BBN) from corpora and incorporating the extracted inferring knowledge with a Support Vector Machines (SVM) classifier has been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We have made use of minimal linguistic resources, such as basic morphological tagging and phrase chunking, to demonstrate that verb subcategorizati...

Publication date: 1994